{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Basic Idea of Reinforcement Learning\n",
    "\n",
    "Let's begin with an analogy. Suppose we are teaching a dog (the agent) to catch a\n",
    "ball. Instead of teaching the dog explicitly how to catch the ball, we just throw the\n",
    "ball, and every time the dog catches it, we give the dog a cookie (a reward). If the\n",
    "dog fails to catch the ball, we do not give it a cookie. The dog will figure out which\n",
    "action earned it the cookie and repeat that action: it learns that catching the ball\n",
    "brings a cookie and tries to catch the ball again. In this way, the dog learns to\n",
    "catch the ball while aiming to maximize the cookies it receives.\n",
    "\n",
    "Similarly, in an RL setting, we do not teach the agent what to do or how to do it;\n",
    "instead, we give the agent a reward for every action it takes: a positive reward\n",
    "when it performs a good action and a negative reward when it performs a bad one.\n",
    "The agent begins by performing random actions. If an action is good, we give the\n",
    "agent a positive reward, so the agent learns that the action was good and repeats\n",
    "it. If an action is bad, we give the agent a negative reward, so the agent learns\n",
    "that the action was bad and avoids repeating it.\n",
    "\n",
    "Thus, RL can be viewed as a trial-and-error learning process in which the agent\n",
    "tries out different actions and learns the good actions, which yield positive\n",
    "rewards.\n",
    "\n",
    "In the dog analogy, the dog represents the agent, giving the dog a cookie when it\n",
    "catches the ball is a positive reward, and withholding the cookie is a negative\n",
    "reward. The dog (agent) explores the different actions, catching the ball and not\n",
    "catching it, and learns that catching the ball is a good action because it brings a\n",
    "positive reward (a cookie).\n",
    "\n",
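    "The trial-and-error loop described above can be sketched in a few lines of Python.\n",
    "This is a minimal illustrative sketch: the two actions, their reward values, and the\n",
    "`train` helper are assumptions made up for this example, not part of any RL library.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "# Hypothetical actions and the reward each yields:\n",
    "# +1 = the dog gets a cookie, -1 = no cookie.\n",
    "REWARDS = {'catch_ball': 1, 'miss_ball': -1}\n",
    "\n",
    "def train(episodes=100, epsilon=0.1, seed=0):\n",
    "    rng = random.Random(seed)\n",
    "    # Accumulated reward observed for each action so far.\n",
    "    total_reward = {action: 0 for action in REWARDS}\n",
    "    for _ in range(episodes):\n",
    "        if rng.random() < epsilon:\n",
    "            # Explore: try a random action.\n",
    "            action = rng.choice(sorted(REWARDS))\n",
    "        else:\n",
    "            # Exploit: repeat the action that has paid off best so far.\n",
    "            action = max(total_reward, key=total_reward.get)\n",
    "        total_reward[action] += REWARDS[action]  # receive the reward\n",
    "    return total_reward\n",
    "\n",
    "print(train())\n",
    "```\n",
    "\n",
    "After enough episodes, the accumulated reward for `catch_ball` is far higher than\n",
    "for `miss_ball`, so the agent settles on catching the ball.\n",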
    "\n",
    "Let's explore the idea of RL further with one more simple example. Suppose we want\n",
    "to teach a robot (the agent) to walk without hitting a mountain, as the following\n",
    "figure shows:\n",
    "\n",
    "![title](Images/1.png)\n",
    "\n",
    "We do not explicitly teach the robot to avoid walking toward the mountain.\n",
    "Instead, if the robot hits the mountain and gets stuck, we give the robot a negative\n",
    "reward, say -1. The robot then learns that hitting the mountain is a bad action and\n",
    "will not repeat it:\n",
    "\n",
    "\n",
    "![title](Images/2.png)\n",
    "\n",
    "Similarly, when the robot walks in the right direction without hitting the mountain,\n",
    "we give the robot a positive reward, say +1. The robot then learns that not hitting\n",
    "the mountain is a good action and will repeat it:\n",
    "\n",
    "![title](Images/3.png)\n",
    "\n",
    "Thus, in the RL setting, the agent explores different actions and learns the best\n",
    "action based on the rewards it receives.\n",
    "\n",
    "Now that we have a basic idea of how RL works, in the upcoming sections we will go\n",
    "into more detail and learn the important concepts involved in RL."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
